Time-Frequency Trade-offs for Audio Source Separation with Binary Masks
Abstract
The short-time Fourier transform (STFT) provides the foundation of binary-mask-based audio source separation approaches. In computing a spectrogram, the STFT window size parameterizes the trade-off between time and frequency resolution. However, it is not yet known how this parameter affects the operation of the binary mask in terms of separation quality for real-world signals such as speech or music. Here, we demonstrate that the trade-off between time and frequency in the STFT, used to perform ideal binary mask separation, depends upon the types of sources that are to be separated. In particular, we demonstrate that different window sizes are optimal for separating different combinations of speech and musical signals. Our findings have broad implications for machine audition and machine learning in general.
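To make the idea concrete, the following is a minimal sketch of ideal binary mask separation repeated at several STFT window sizes. It is not the authors' implementation. The helper names (`stft`, `istft`, `ibm_separate`, `snr_db`), the Hann window with 50% overlap, the plain signal-to-noise measure, and the two synthetic sources (a steady tone and a click train standing in for real speech or music) are all assumptions made for illustration.

```python
import numpy as np


def stft(x, win_size):
    """STFT with a periodic Hann window and 50% overlap (win_size must be even)."""
    hop = win_size // 2
    window = np.hanning(win_size + 1)[:-1]  # periodic Hann: overlapping copies sum to 1
    # Zero-pad so that every original sample is covered by two frames.
    padded = np.concatenate([np.zeros(hop), x, np.zeros(win_size)])
    n_frames = 1 + (len(padded) - win_size) // hop
    frames = np.stack(
        [padded[i * hop : i * hop + win_size] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, win_size, length):
    """Inverse of stft() above by overlap-add; returns `length` samples."""
    hop = win_size // 2
    frames = np.fft.irfft(spec, n=win_size, axis=1)
    out = np.zeros(hop * (spec.shape[0] - 1) + win_size)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + win_size] += frame
    return out[hop : hop + length]  # drop the leading padding


def ibm_separate(s1, s2, win_size):
    """Separate the mixture s1 + s2 with the ideal binary mask.

    The mask is "ideal" because it is computed from the known sources:
    a time-frequency bin is assigned to source 1 where |S1| > |S2|.
    """
    spec1, spec2 = stft(s1, win_size), stft(s2, win_size)
    mix = stft(s1 + s2, win_size)
    mask = np.abs(spec1) > np.abs(spec2)
    est1 = istft(mix * mask, win_size, len(s1))
    est2 = istft(mix * ~mask, win_size, len(s1))
    return est1, est2


def snr_db(ref, est):
    """Signal-to-noise ratio of an estimate in dB (a simple stand-in quality measure)."""
    noise = ref - est
    return 10 * np.log10(np.sum(ref**2) / (np.sum(noise**2) + 1e-12))


# Toy sources (assumed for illustration, not the data used in the paper).
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)  # steady, narrowband source
clicks = np.zeros(fs)
clicks[::800] = 5.0                 # transient, broadband source

results = {}
for win in (64, 256, 1024):
    est_tone, est_clicks = ibm_separate(tone, clicks, win)
    results[win] = (snr_db(tone, est_tone), snr_db(clicks, est_clicks))
    print(f"window={win:5d}  tone SNR={results[win][0]:6.2f} dB  "
          f"clicks SNR={results[win][1]:6.2f} dB")
```

Swapping the toy signals for recorded speech or music, and the SNR for a standard separation metric, would bring the sketch closer to the kind of comparison the abstract describes. Which window size scores best will depend on the sources used.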
Similar resources
Music Remixing and Upmixing Using Source Separation
Current research on audio source separation provides tools to estimate the signals contributed by different instruments in polyphonic music mixtures. Such tools can already be incorporated in music production and post-production workflows. In this paper, we describe recent experiments where audio source separation is applied to remixing and upmixing existing mono and stereo music content. 1. AU...
Blind Source Separation Using Mixtures of Alpha-Stable Distributions
We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have recently been shown to model audio signals in the time-frequency domain better than classical Gaussian distributions, thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability...
Informed algorithms for sound source separation in enclosed reverberant environments
While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are “informed”, i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by ...
Combining Mask Estimates for Single Channel Audio Source Separation Using Deep Neural Networks
Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks ...
A Source Reassignment Technique for Time-frequency Masking Audio Separation
A neighborhood-based source reassignment technique is proposed for use with time-frequency masking audio source separation methods. This technique identifies all the time-frequency clusters that form the separation masks in the Short-Time Fourier Transform (STFT) domain, and labels each time-frequency bin with a value that denotes the size of its corresponding cluster. The bins correspo...
Journal title:
- CoRR
Volume: abs/1504.07372
Publication date: 2015